Date: Sat, 25 Jun 2011 15:58:33 -0700
From: perryh@pluto.rain.com
To: freebsd-drivers@freebsd.org
Subject: fatal ata WRITE_DMA48 UDMA ICRC errors
Message-ID: <4e066819.DRprHaL0TvBGL6Jl%perryh@pluto.rain.com>
Once in a while, on a recently-installed 8.1-RELEASE, I get a
sequence like this (reformatted):

Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5). ad8s2a[WRITE(offset=315265765888, length=78336)]
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider ad8s2a disconnected.

The sequence is consistent: a retried WRITE_DMA48 UDMA ICRC error on
ad8, a WRITE_DMA48 "FAILURE" on the same LBA with status=51 and
error=4, a gmirror "Request failed (error=5)", and a disconnect.
The LBA, offset, and length vary from one instance to another.

The retry seems to succeed most of the time -- the "WARNING -
WRITE_DMA48 UDMA ICRC error" message most often is not closely
followed by anything else -- but it is immediately followed by a
failure with status=51 and error=4 frequently enough to be a
significant problem (since it breaks the mirror).

The cable between the controller and the drive has been a factor --
the errors became much more frequent the first time I replaced it --
but I'm still getting occasional errors even with a brand-new cable.

I doubt there is anything wrong with the (nearly new) drive, because
I am not having any trouble at all with an identical drive connected
to the onboard ata controller as ad0, but I wonder if there may be
known issues with the VIA-based PCI card that provides two SATA ports
along with the ad8 ATA port. (Nothing is connected as ad9, and I
haven't yet tried to use either of the SATA devices.)

I've asked on geom@ about the possibility of making gmirror more
robust to this sort of event, but the better solution would be to
improve the handling at the hardware or ata driver level.

What would cause the ad8 driver to sometimes return a FAILURE
indication after a single retryable error?
Would it make sense to treat this indication (with status=51 and
error=4) as retryable?

Relevant parts of dmesg:

pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <VIA 6421 SATA150 controller> port 0xdc70-0xdc7f,0xdc50-0xdc5f,0xdc30-0xdc3f,
  0xdc10-0xdc1f,0xd8e0-0xd8ff,0xd400-0xd4ff irq 19 at device 11.0 on pci2
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci0
ata3: [ITHREAD]
ata4: <ATA channel 2> on atapci0
ata4: [ITHREAD]
pcib3: <PCI-PCI bridge> at device 14.0 on pci2
pci3: <PCI bus> on pcib3
atapci1: <Intel ICH UDMA66 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf
  at device 31.1 on pci0
ata0: <ATA channel 0> on atapci1
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci1
ata1: [ITHREAD]
ad0: 305245MB <Hitachi HDT725032VLAT80 V54OA4NA> at ata0-master UDMA66
ad1: 32253MB <MAXTOR 6L040L2 A93.0500> at ata0-slave UDMA66
acd0: <Lite-On LTN483S 48x Max/PD02> CDROM drive at ata1 as slave
ad4: 61136MB <PATRIOT MEMORY 64GB SSD 02.10104> at ata2-master UDMA100 SATA 1.5Gb/s
acd1: <PIONEER DVD-RW DVR-212D/1.24> DVDR drive at ata3 as master
ad8: 305245MB <Hitachi HDT725032VLAT80 V54OA4NA> at ata4-master UDMA133
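
For reference, the status=51 and error=4 values in the FAILURE line
are hexadecimal bit masks. Below is a minimal Python sketch of how
they decode. The READY, DSC, ERROR and ABORTED names are taken from
the log itself; every other bit name is an assumption based on the
conventional ATA status/error register layout, not something stated
in this message, and the decode() helper is purely illustrative.

```python
# Bit names marked "from log" appear in the kernel messages above.
# All other names are assumptions from the usual ATA register layout.
STATUS_BITS = {
    0x80: "BUSY",
    0x40: "READY",        # from log
    0x20: "DMA_READY",
    0x10: "DSC",          # from log
    0x08: "DRQ",
    0x04: "CORRECTABLE",
    0x02: "INDEX",
    0x01: "ERROR",        # from log
}

ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNCORRECTABLE",
    0x20: "MEDIA_CHANGED",
    0x10: "ID_NOT_FOUND",
    0x08: "MEDIA_CHANGE_REQUEST",
    0x04: "ABORTED",      # from log
    0x02: "NO_MEDIA",
    0x01: "ILLEGAL_LENGTH",
}


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


if __name__ == "__main__":
    # The log prints these registers in hex: status=51, error=4.
    print("status:", decode(0x51, STATUS_BITS))
    print("error: ", decode(0x04, ERROR_BITS))
```

Under these assumptions, the FAILURE line reports only ABORTED in the
error value. The bit assumed here to be ICRC (0x80) is not set, which
matches the absence of ICRC from the decoded error=4<ABORTED> text in
the log.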